Rare coding variants in CHRNB2 reduce the likelihood of smoking

Human genetic studies of smoking behavior have thus far been largely limited to common variants. Studying rare coding variants has the potential to identify drug targets. We performed an exome-wide association study of smoking phenotypes in up to 749,459 individuals and discovered a protective association in CHRNB2, encoding the β2 subunit of the α4β2 nicotinic acetylcholine receptor. Rare predicted loss-of-function and likely deleterious missense variants in CHRNB2 in aggregate were associated with a 35% decreased odds for smoking heavily (odds ratio (OR) = 0.65, confidence interval (CI) = 0.56–0.76, P = 1.9 × 10^−8). An independent common variant association in the protective direction (rs2072659; OR = 0.96; CI = 0.94–0.98; P = 5.3 × 10^−6) was also evident, suggesting an allelic series. Our findings in humans align with decades-old experimental observations in mice that β2 loss abolishes nicotine-mediated neuronal responses and attenuates nicotine self-administration. Our genetic discovery will inspire future drug designs targeting CHRNB2 in the brain for the treatment of nicotine addiction.

Your Article, "Rare coding variants in CHRNB2 reduce the likelihood of smoking" has now been seen by 3 referees. You will see from their comments below that while they find your work of interest, some important points are raised. We are interested in the possibility of publishing your study in Nature Genetics, but would like to consider your response to these concerns in the form of a revised manuscript before we make a final decision on publication.
To guide the scope of the revisions, the editors discuss the referee reports in detail within the team with a view to identifying key priorities that should be addressed in revision. In this case, we think all three referees have provided constructive reviews aimed at strengthening the analyses and improving the presentation, and we particularly ask that you address their comments as thoroughly as possible with appropriate revisions. We hope that you will find the prioritized set of referee points to be useful when revising your study.
We therefore invite you to revise your manuscript taking into account all reviewer and editor comments. Please highlight all changes in the manuscript text file. At this stage we will need you to upload a copy of the manuscript in MS Word .docx or similar editable format.
We are committed to providing a fair and constructive peer-review process. Do not hesitate to contact us if there are specific requests from the reviewers that you believe are technically impossible or unlikely to yield a meaningful outcome.
When revising your manuscript:
*1) Include a "Response to referees" document detailing, point-by-point, how you addressed each referee comment. If no action was taken to address a point, you must provide a compelling argument. This response will be sent back to the referees along with the revised manuscript.
*2) If you have not done so already, please begin to revise your manuscript so that it conforms to our Article format instructions, available <a href="http://www.nature.com/ng/authors/article_types/index.html">here</a>. Refer also to any guidelines provided in this letter.
*3) Include a revised version of any required Reporting Summary: https://www.nature.com/documents/nr-reporting-summary.pdf It will be available to referees (and, potentially, statisticians) to aid in their evaluation if the manuscript goes back for peer review. A revised checklist is essential for re-review of the paper.
Please be aware of our <a href="https://www.nature.com/natureresearch/editorial-policies/image-integrity">guidelines on digital image standards.</a> Please use the link below to submit your revised manuscript and related files: [redacted] <strong>Note:</strong> This URL links to your confidential home page and associated information about manuscripts you may have submitted, or that you are reviewing for us. If you wish to forward this email to co-authors, please delete the link to your homepage.
We hope to receive your revised manuscript within three to six months. If you cannot send it within this time, please let us know.
Please do not hesitate to contact me if you have any questions or would like to discuss these revisions further.
Nature Genetics is committed to improving transparency in authorship. As part of our efforts in this direction, we are now requesting that all authors identified as 'corresponding author' on published papers create and link their Open Researcher and Contributor Identifier (ORCID) with their account on the Manuscript Tracking System (MTS), prior to acceptance. ORCID helps the scientific community achieve unambiguous attribution of all scholarly contributions. You can create and link your ORCID from the home page of the MTS by clicking on 'Modify my Springer Nature account'. For more information please visit <a href="http://www.springernature.com/orcid">www.springernature.com/orcid</a>.
We look forward to seeing the revised manuscript and thank you for the opportunity to review your work.
Reviewer #1: Remarks to the Author:
This is the first well-powered rare variant association study of smoking behaviour. The study is very thorough; it includes different samples (including individuals from non-European ancestry) and six (primary) smoking phenotypes, it combines information on rare and common variants, replicates the main finding in an isolated population, and includes several follow-up analyses. Overall, the manuscript is written up well and has a good balance between describing findings and interpretations. The analytical approaches seem valid and the conclusions robust. The authors found a protective association of CHRNB2 rare variants (plus significant associations implicating 2 other genes), and they showed convergence of rare and common variant results in CHRNB2. This represents the first human genetic evidence supporting the hypothesis that loss of CHRNB2 protects people against nicotine dependence, and hence could be useful in discovering new therapeutics. The manuscript should be highly cited and fits well within the target journal.
I do not have any major comments, other than that the new GSCAN paper has just come out online with GWAS results from a substantially larger sample (>3M individuals), see https://www.nature.com/articles/s41586-022-05477-4#MOESM3. I think the authors will have to integrate these results in their manuscript and analyses (for example when calculating PGSs they should use the summary level data from this manuscript instead of the older GSCAN paper).
- 'However, human genetic studies of smoking behavior have so far focused mainly on common variants (those observed in more than 1% of the population)15-17.' While this has been the case before, many GWASs do consider variants with a MAF below 1% (depending on the GWAS sample size). The GSCAN paper by Liu et al for example also looked at variants with a MAF between 0.1 and 1%, and in the new GSCAN paper that just came out (see above), they even included variants with a lower MAF than 0.1%.
Reviewer #2: Remarks to the Author:
Summary of the key results
Rajagopal et al. performed an association study of multiple smoking phenotypes based on whole-exome sequencing data from multiple cohorts. They mainly focused on the rare variants in the coding regions and identified a signal in CHRNB2 in the aggregation analyses. They also presented data on common variants in this region and other reported regions. They proposed that CHIP mutations explained another two signals in ASXL1 and DNMT3A. This is an association study with large sample size. However, I have some concerns about the novelty of the results. In addition, some methods were not clearly described.
Originality and significance: if not novel, please include reference
1. The major finding of this study is a genetic signal in CHRNB2. However, the CHRNB2 region has been reported in a previous GWAS (PMID: 30643251), as mentioned by the authors. I agree with the authors that the rare coding variants may contribute greatly to the susceptibility of phenotypes, but the novelty of the exome-wide screening will be greatly diminished if the region has been reported.
2. Only CHIP mutations of ASXL1 and DNMT3A were associated with smoking behaviors; that is interesting. The authors propose their hypotheses (selective advantage); however, they do not present additional supporting data. They should look at the variant allele fraction (VAF) of these mutations significantly associated with smoking phenotypes.
Data & methodology: validity of approach, quality of data, quality of presentation
3. I do not see any methods for the analysis of CHIP mutations. How are these mutations defined? Also, I do not find the mean coverage of the whole-exome sequencing (not the percentage of 20X region); coverage might greatly influence the number of CHIP mutations. Is there any difference across different cohorts?
4. The signal of CHRNB2 was associated with the phenotype of heavy smoker in UKB/GHS/MCPS. The readers may also want to know the association between CHRNB2 variants and the phenotype of ever smoker in the SINAI study.
5. The authors provide a single P value for analyses based on multiple cohorts. What is the combined strategy? Direct combination or meta-analysis? If direct combination, the author may need to describe how to correct the batch effects. If meta-analysis, they may need to show the results of reported signals in each cohort.
6. Page 7, line 23. The effect of CHRNB4 pLOF is different from that in Supplementary Table 10. What is the mean of "Beta=0.65 SD"?
Appropriate use of statistics and treatment of uncertainties
7. No statistical tests were performed in the 'Interplay between common and rare variants' section. Thus, any conclusions in this section may not be appropriate.
8. Page 5, line 33-34. No description of the methods for the enrichment analyses based on the data from the FinnGen project.
Conclusions: robustness, validity, reliability
9. Page 10, line 17-19. "However, to the best of our knowledge, what we describe here is the first human genetic evidence supporting the hypothesis that loss of CHRNB2 protects against nicotine addiction." This is not true. As mentioned by the authors themselves, Liu et al. have already presented the data of the CHRNB2 region (rs2072659), as in Supplementary Table.
Suggested improvements: experiments, data for possible revision
10. Because smoking is the leading risk factor of lung cancer, the genetic variants associated with smoking commonly contribute to the risk of developing lung cancer (e.g., 15q25.1). Are rare CHRNB2 variants associated with lung cancer risk in UKB data?
11. In previous GWAS, more than 400 loci were reported to be associated with smoking. What about the associations for the rare variants in these loci?
12. Heavy smokers were defined as those who smoked 10 or more cigarettes per day. Additional sensitivity analysis should be performed to test the robustness of the definition, as this definition was different from other studies, which used pack-years of 20/30 to define heavy smokers.
References: appropriate credit to previous work?
13. As whole-genome sequencing is gradually implemented in the population study (PMID: 32499645, 32581362, 33589841, 36113475), the authors may also discuss the limitation of the population study based on whole-exome sequencing.
Clarity and context: lucidity of abstract/summary, appropriateness of abstract, introduction and conclusions
14. Abstract. "α4β2 is the predominant nAChR in human brain and is one of the targets of varenicline, a partial nAChR agonist/antagonist used to aid smoking cessation." This is not the result of this association study. The description here is misleading.
Reviewer #3: Remarks to the Author:
The manuscript by Rajagopal and colleagues describes a comprehensive ExWAS of smoking for rare variants in up to 749,459 individuals across multiple ancestries. The study leveraged several well-known cohorts including UK Biobank, GHS, MCPS, and SINAI. They used 6 primary smoking phenotypes for their analysis. The main finding is that rare coding variants in CHRNB2 are protective for smoking. The authors also identified two other genes, ASXL1 and DNMT3A, that reached ExWAS significance and determined that these genetic signals were mainly due to the CHIP variants. The relationship between common variants, as well as PGS, and rare variants of CHRNB2 with smoking was also explored.
There is a prior publication from the TOPMED study that described the contribution of rare variants to smoking based on WGS (Jang et al, PMID: 35927319). As the authors stated, that study was not able to pinpoint any specific gene. This study is novel and adds to our current knowledge of genetics associated with smoking behavior by analyzing rare variants.
Their findings are also supported by animal-based studies which showed protective effects of CHRNB2 in nicotine dependence. Validation of a known 3'UTR SNP in CHRNB2 with this study also supports the validity of CHRNB2 as a protective gene. This study supports CHRNB2 as a potential therapeutic target for smoking cessation and addiction and the availability of known drugs targeting this important class of proteins.
Although the manuscript is well written, revision of the text to be more precise will improve the clarity and help the readers:
1. It is probably better to move Supplemental Fig. 1 to the main text. The total N of subjects from each study included in this study should be indicated.
2. Add total N to the "Case counts" and "Control counts" for Fig 1b for each gene.
3. The section "Associations of rare and common variants in CHRNB2" in Results should be in two sections describing rare and common variants separately.
4. The long paragraph in the discussion may be better as two paragraphs.

Author Rebuttal to Initial comments
Rare coding variants in CHRNB2 reduce the likelihood of smoking
Revision summary
We thank the editors and reviewers for reviewing our manuscript and offering valuable comments and feedback. We have revised our manuscript as per the reviewers' recommendations. Some of the major revisions are summarized below:
- Updated PGS analysis using new scores generated using the latest GWAS of smoking behavior by Saunders et al, Nature 2022.
- Sensitivity analysis by defining heavy smoking in different ways and demonstrating the consistency of the protective association of CHRNB2 rare variants with heavy smoking.
- Association analysis of variant allele fraction (VAF) of CHIP mutations with smoking phenotypes
- Rare variant burden analysis focusing only on the known GWAS loci
- Statistical tests for the PGS analysis
Below, we list a point-by-point response to reviewers' comments and a detailed description of the analysis and revisions we made to the manuscript.
Response to the reviewers' comments
Reviewer #1:
This is the first well powered rare variant association study of smoking behaviour. The study is very thorough; it includes different samples (including individuals from non-European ancestry) and six (primary) smoking phenotypes, it combines information on rare and common variants, replicates the main finding in an isolated population, and includes several follow-up analyses. Overall, the manuscript is written up well and has a good balance between describing findings and interpretations. The analytical approaches seem valid and the conclusions robust. The authors found a protective association of CHRNB2 rare variants (plus significant associations implicating 2 other genes), and they showed convergence of rare and common variant results in CHRNB2. This represents the first human genetic evidence supporting the hypothesis that loss of CHRNB2 protects people against nicotine dependence, and hence could be useful in discovering new therapeutics. The manuscript should be highly cited and fits well within the target journal.
We thank the reviewer for this positive overview of our study.
I do not have any major comments, other than that the new GSCAN paper has just come out online with GWAS results from a substantially larger sample (>3M individuals), see https://www.nature.com/articles/s41586-022-05477-4. I think the authors will have to integrate these results in their manuscript and analyses (for example when calculating PGSs they should use the summary level data from this manuscript instead of the older GSCAN paper).
We thank the reviewer for this suggestion. We have now updated the PGS analysis using a new polygenic score that was based on larger training data (N=~500k). We obtained the latest GSCAN summary statistics 1 (excluding UK Biobank and 23andMe) and meta-analyzed them with our GHS cohort to boost the sample size further (UKB cannot be included because it's the target sample, and MCPS cannot be included as it predominantly includes admixed American individuals). The new PGS yielded more statistical power in terms of visualizing the prevalence of heavy smokers across polygenic quintiles. We have further added new statistical analyses as per the request of Reviewer 2 (see our response to comment 15).
We thank the reviewer for pointing to this reference. We agree that nicotine dependence has a higher twin heritability than smoking initiation. We have updated the text as below and updated the citation to Vink et al 2005 2.
"Smoking behavior is strongly influenced by genetics, with twin-based heritability estimates ranging between 45% (for smoking initiation) and 75% (for nicotine dependence)"
'However, human genetic studies of smoking behavior have so far focused mainly on common variants (those observed in more than 1% of the population)15-17.' While this has been the case before, many GWASs do consider variants with a MAF below 1% (depending on the GWAS sample size). The GSCAN paper by Liu et al for example also looked at variants with a MAF between 0.1 and 1%, and in the new GSCAN paper that just came out (see above), they even included variants with a lower MAF than 0.1%.
We agree with the reviewer that the sentence about MAF threshold selection in previous GWASs needs clarity. In a GWAS, researchers often restrict the analysis to only variants with MAF>1% as the imputation accuracy drops below that cut-off. However, with the recent increase in the GWAS sample sizes, researchers have started looking beyond the MAF 1% cut-off, as the large sample size compensates for the loss of statistical power due to low imputation accuracy. For example, in the recent GWAS by Saunders et al. 1 the reviewer is referring to, the authors studied genetic variants with MAF down to 0.1% but chose only those variants with an effective sample size (actual sample size x imputation accuracy) of at least 10% of the study's maximum sample size. Although this approach helps to study some variants with MAF between 0.1% and 1% with moderate imputation accuracy, it still misses many variants within this range that cannot be imputed confidently but can only be sequenced directly. It is challenging to describe these nuances to the reader briefly in the background section. At the same time, we agree with the reviewer that it's not correct to bluntly say that past GWASs have focused only on common variants as that's not entirely true. So, we have now updated the text below to add clarity on the MAF range we are referring to.
"… Genetic variants across the full minor allele frequency (MAF) spectrum-common (MAF>1%), low-frequency (MAF 0.1%-1%), and rare variants (MAF<0.1%)-contribute to this high heritability. However, human genetic studies of smoking behavior have so far focused mainly on common and low-frequency variants (that can be imputed with at least a moderate accuracy). …"
Methods: provide N when describing the different cohorts (page 11)
We have reported in detail the sample sizes for the six smoking phenotypes for individual cohorts and ancestries in Supplementary Table 1. We have mentioned the same as quoted below in the manuscript, under the section titled "Exome-wide significant associations".
"The study cohorts and phenotype definitions are described in Methods, and the cohort-specific sample sizes and participant demographics are summarized in Supplementary Tables 1 and 2 respectively."
We couldn't define the heavy smoking phenotype in the SINAI cohort as we didn't have information about the number of cigarettes smoked per day or smoking pack years. However, we have reported the CHRNB2 pLOF-plus-missense burden association with the ever-smoker phenotype in Supplementary Figure 5b.
Reviewer #2:
Rajagopal et al. performed an association study of multiple smoking phenotypes based on whole-exome sequencing data from multiple cohorts. They mainly focused on the rare variants in the coding regions and identified a signal in CHRNB2 in the aggregation analyses. They also presented data on common variants in this region and other reported regions. They proposed that CHIP mutations explained another two signals in ASXL1 and DNMT3A. This is an association study with large sample size. However, I have some concerns about the novelty of the results. In addition, some methods were not clearly described.
Originality and significance: if not novel, please include reference
The major finding of this study is a genetic signal in CHRNB2. However, the CHRNB2 region has been reported in a previous GWAS (PMID: 30643251), as mentioned by the authors. I agree with the authors that the rare coding variants may contribute greatly to the susceptibility of phenotypes, but the novelty of the exome-wide screening will be greatly diminished if the region has been reported.
We highly value the past GWAS results and believe they can significantly help prioritize gene targets in the exome studies. But we disagree with the reviewer's view that the novelty of our finding is "greatly diminished" because of a prior report on the association of common variants near CHRNB2. We note that the GWAS locus near CHRNB2 was among the >400 loci identified by Liu et al (2019) 3, and the gene CHRNB2 is one among the hundreds of genes mapped to the GWAS loci and reported in the supplementary tables. Furthermore, in the recent GWAS by Saunders et al (2022) 1, the CHRNB2 locus is one among the >2000 loci and CHRNB2 among >700 genes listed in the supplementary tables. Despite the well-known mechanistic links of CHRNB2 with smoking addiction supported by numerous mouse studies, the gene name does not appear anywhere in the manuscripts by Liu et al and Saunders et al, except in the supplementary tables. That is in fact a pitfall of GWASs because they point to hundreds of genes and the most valuable findings get lost in the crowd. Rare variant studies such as ours help identify such important findings from the GWAS. In our analysis, we identified, for the first time, significant associations between functional coding variants in CHRNB2 and smoking behavior. This finding directly implicates CHRNB2 as the causal gene and it also informs about the direction of the association, which corroborates past animal studies 4,5. Even though the link between CHRNB2 and smoking is itself not novel (it's been known since the 1990s), we emphasize that our finding demonstrating the protective effect of the loss of CHRNB2 on smoking behavior in humans is indeed novel. If anything, our findings greatly complement the previous GWAS findings and highlight the importance of studying both GWAS and ExWAS findings together as we have discussed in the paper. So, we believe that the past GWASs do not diminish the value of our study, nor does our study undermine the value of past GWASs.
Only CHIP mutations of ASXL1 and DNMT3A were associated with smoking behaviors; that is interesting. The authors propose their hypotheses (selective advantage); however, they do not present additional supporting data. They should look at the variant allele fraction (VAF) of these mutations significantly associated with smoking phenotypes.
As per the reviewer's request, we have tested the associations of VAF in CHIP mutations in the eight most recurrent CHIP genes with the smoking phenotypes in a merged dataset of CHIP mutation carriers in the UKB (N=28,348) and GHS (N=11,063) cohorts. We aggregated the VAF estimates for CHIP mutations within each (and across all) of the eight genes and tested their associations with smoking phenotypes through regression analysis adjusted for age, sex, first 10 genetic PCs and a dummy variable for the cohort of origin. After correcting for multiple testing (FDR 1%), only the associations of VAF aggregated across all the genes with ever-smoker and heavy-smoker remained statistically significant. This is expected because the sample size drops when studying the VAF of individual genes. Looking at the nominally associated genes, the strongest association was seen for ASXL1 VAF with ever-smoker and heavy-smoker. In addition, we also observed a few nominal associations for TET2, TP53 and SRSF2. We report these results in Supplementary Table 10 and Supplementary Fig 11 (shown below). We added a short description of these results to the text under the title 'Associations of CHIP mutations in ASXL1 and DNMT3A' in the results and added a paragraph in the methods (shown as tracked changes). And we have now deleted the text quoted below, as our confidence in the specificity of smoking associations with ASXL1 and DNMT3A CHIP has reduced in the light of new nominal associations of smoking with VAF of TET2 and other CHIP genes. We thank the reviewer for their suggestion to look at the VAF associations.
Deleted text: "Notably, certain highly recurrent CHIP driver genes such as TET2 did not show significant associations with any of the six smoking phenotypes. This suggests that smoking influences the evolution of CHIP mutations through mechanisms that affect not all but only a specific set of genes"
Data & methodology: validity of approach, quality of data, quality of presentation
I do not see any methods for the analysis of CHIP mutations. How are these mutations defined? Also, I do not find the mean coverage of the whole-exome sequencing (not the percentage of 20X region); coverage might greatly influence the number of CHIP mutations. Is there any difference across different cohorts?
Detailed descriptions of CHIP mutation calling in the UKB and GHS cohorts and quality control procedures can be found in our recent publication 6 focused exclusively on the genetics of CHIP. Since our primary focus in the current paper is not CHIP, we opted to mention the CHIP-related findings briefly and redirect interested readers to our primary CHIP paper. Since the reviewer has requested it, we have now added a section in the Methods, titled "CHIP mutation analysis", briefly describing the CHIP mutation calling and the genetic and VAF association analyses of CHIP mutations (shown as tracked changes).
Regarding coverage, as part of the standard operating procedure, we always exclude samples with less than 80% coverage of target regions with at least 20x depth. When evaluating the samples included in the final analysis, on average we capture greater than 95% of the target regions with at least 20x sequencing depth, with most of the participants (>98%) having more than 90% coverage. We use this measure because it provides a more accurate picture of genomic coverage than the mean. Cohort-specific sequencing metrics can be found in our primary exome-sequencing publications of the participating cohorts 7-11. The CHIP mutation associations are mainly driven by the UK Biobank and GHS cohorts, which contain older participants. We did not see any major differences in the CHIP calling between the two cohorts, which we have investigated extensively in our CHIP-focused paper 6.
The signal of CHRNB2 was associated with the phenotype of heavy smoker in UKB/GHS/MCPS. The readers may also want to know the association between CHRNB2 variants and the phenotype of ever smoker in the SINAI study.
We have reported the association of CHRNB2 pLOF-plus-missense burden with ever-smoker in the SINAI cohort in Supplementary Fig. 5b.
The authors provide a single P value for analyses based on multiple cohorts. What is the combined strategy? Direct combination or meta-analysis? If direct combination, the author may need to describe how to correct the batch effects. If meta-analysis, they may need to show the results of reported signals in each cohort.
We performed ExWAS and GWAS within each of the cohorts separately and then meta-analyzed the results using an inverse-variance weighted approach using the METAL software. We have now revised the text in the Methods under the section 'Genetic association analysis' accordingly as quoted below.
"Genetic association analyses were done within each of the cohorts separately using REGENIE software and the results were then meta-analyzed together using an inverse-variance weighted approach using METAL software. …"
As per the reviewer's request, we have now updated Supplementary Table 4 to show association statistics of all the significant variant and burden associations for individual cohorts and meta-analysis (in the first version of the manuscript only meta-analysis results were reported).
Page 7, line 23. The effect of CHRNB4 pLOF is different from that in Supplementary Table 10. What is the mean of "Beta=0.65 SD"?
We apologize for the discrepancy between the values in the text and the table. The effect size in the text was based on conditional analysis, which was done to test if there is a statistically significant association beyond the known common variant signals. To avoid confusion, we have now changed the effect sizes in the text to reflect the raw effect sizes from the ExWAS that were reported in the supplementary table. We revised the text as follows.
"Notably, the largest effect size was observed for the CHRNB4 pLOF-only rare variant burden where the 13 pLOF carriers smoked on average ~6.8 fewer cigarettes per day more compared to … P=0.008). This effect size is ~3 to 4 times larger than the largest effect size observed for CHRNA5 (Beta=0.23; P=0.01) and CHRNA3 (Beta=0.16; P=0.03) pLOF-only rare variant burden and ~7.5 times larger than rs16969968 (~1 cigarette more; Beta=0.09; …), a well-characterized common risk variant at this locus"
The quantitative phenotype cig-per-day was scaled to have a mean zero and standard deviation one to express the association effect sizes in terms of the number of SDs (one SD is approximately 10 cigarettes per day). So, an effect size of 0.68 SD would mean 6.8 cigarettes (0.68 x 10 cigarettes) difference between carriers and non-carriers.
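The conversion from standardized betas to cigarettes per day can be checked with a short sketch. It assumes, as stated above, that one SD is roughly 10 cigarettes per day, and it takes 0.68 SD (the value used in the authors' own illustration) as the CHRNB4 effect size; the helper name is hypothetical.

```python
SD_CIGS_PER_DAY = 10.0  # approximate SD quoted in the response (assumption)

def beta_to_cigarettes(beta_sd, sd=SD_CIGS_PER_DAY):
    """Convert an effect size in SD units to cigarettes per day."""
    return beta_sd * sd

# Effect sizes in SD units as quoted in the response and the revised text
QUOTED_BETAS = {
    "CHRNB4 pLOF": 0.68,
    "CHRNA5 pLOF": 0.23,
    "CHRNA3 pLOF": 0.16,
    "rs16969968": 0.09,
}

reference = QUOTED_BETAS["CHRNB4 pLOF"]
for name, beta in QUOTED_BETAS.items():
    cigs = beta_to_cigarettes(beta)
    ratio = reference / beta  # how many times larger the CHRNB4 effect is
    print(f"{name}: {cigs:.1f} cigarettes/day, CHRNB4 is {ratio:.1f}x this effect")
```

The ratios agree with the quoted "~3 to 4 times" (CHRNA5 and CHRNA3) and "~7.5 times" (rs16969968) comparisons.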

Appropriate use of statistics and treatment of uncertainties
No statistical tests were performed in the 'Interplay between common and rare variants' section. Thus, any conclusions in this section may not be appropriate.
As per the reviewer's request, we have now added statistical analyses to this section. First, we analyzed the associations of PGS and CHRNB2 rare variant burden with heavy smoking using a logistic regression model that included an interaction term between the two and appropriate covariates. We report the odds ratio, 95% confidence intervals and P value for the association of CHRNB2 rare variant burden with heavy smoking (OR=0.66; 95% CI=0.56-0.79; P=3.4e-6) and the beta coefficient, standard error and P value for the association of PGS with heavy smoking (beta=0.33; SE=0.004; P=1e-300) and P value for the interaction term between the two (P=0.71), which was not significant as expected, corroborating past reports that show that the effects of rare and common variants are mostly independent of each other and additive. Second, we performed a quintile analysis where we divided the UKB participants into 5 equal groups and visualized the prevalence of heavy smoking in each of the groups separately in carriers and non-carriers of CHRNB2 rare variant burden. To make a statistical comparison of the smoking prevalence between carriers and non-carriers within each quintile, we did logistic regression analysis within each group testing the association of CHRNB2 rare variant burden with heavy smoking after adjusting for appropriate covariates. We report the odds ratio, 95% confidence interval and P value for each of the five groups in the main figure 4 (shown below). We have revised the manuscript sections accordingly shown as track changes (text under the title 'interplay between common and rare variants' in results and text under the title 'polygenic score analysis' in methods) Page 5, line 33-34. No description of the methods for the enrichment analyses based on the data from the FinnGen project.
We have now added a section titled "FinnGen analysis" in the methods describing the enrichment analysis we performed in the FinnGen results as quoted below.

"We downloaded the associations of variant (rs202079239) with 3095 disease endpoints in the FinnGen database using their web browser (https://r7.finngen.fi/variant/1-154575801-C-G). Through string search, we extracted associations related to smoking, substance abuse, addiction, COPD and other lung diseases. To test for enrichment of protective associations (OR<1) in the extracted phenotypes, we did a hypergeometric test using the 'phyper' function implemented in the R base package by passing the following values: q=36 (number of associations with OR<1 among the smoking-related phenotypes), m=2018 (number of associations with OR<1 among all the phenotypes), n=1077 (number of associations with OR>1 among all the phenotypes) and k=47 (total number of smoking-related phenotypes extracted)."

Conclusions: robustness, validity, reliability
Page 10, line 17-19. "However, to the best of our knowledge, what we describe here is the first human genetic evidence supporting the hypothesis that loss of CHRNB2 protects against nicotine addiction." This is not true. As mentioned by the authors themselves, Liu et al. have already presented the data of the CHRNB2 region (rs2072659), as in Supplementary Table.
We have addressed this under comment 9. Liu et al. (ref. 3) only identified a common variant near CHRNB2 and this does not conclusively prove that CHRNB2 is the causal gene driving the association. It also does not provide information about the direction of the association, i.e., if increased or decreased gene function is linked to decreased smoking. On the other hand, our findings of protective associations between functional coding variants in CHRNB2 with smoking directly implicate CHRNB2 and support the hypothesis that loss of CHRNB2 protects against nicotine addiction. Moreover, we have appropriately cited Liu et al. (and now also, Saunders et al.) and mentioned in the article that we identified the common variant signal searching through the results from Liu et al.
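The hypergeometric enrichment test described in the quoted FinnGen analysis can be sketched in a few lines. The following is a minimal illustration in Python rather than R, using only the four values quoted above (q=36, m=2018, n=1077, k=47; note that m+n equals the 3095 endpoints). The quoted methods text does not say which tail convention was passed to 'phyper', so the sketch computes both P(X ≥ q) and P(X > q); the function names are hypothetical and chosen only for this example.

```python
from math import comb

def hypergeom_pmf(x, m, n, k):
    """P(X = x) when k items are drawn without replacement from
    m 'protective' (OR<1) and n 'non-protective' (OR>1) associations."""
    if x < max(0, k - n) or x > min(k, m):
        return 0.0
    # Exact integer binomials, converted to float only at the final division.
    return comb(m, x) * comb(n, k - x) / comb(m + n, k)

def hypergeom_upper_tail(q, m, n, k, inclusive=True):
    """Upper-tail probability: P(X >= q) if inclusive, else P(X > q)."""
    start = q if inclusive else q + 1
    return sum(hypergeom_pmf(x, m, n, k) for x in range(start, min(k, m) + 1))

# Values quoted in the FinnGen analysis text.
q, m, n, k = 36, 2018, 1077, 47

# Expected number of protective associations among the k extracted phenotypes
# if protective associations were no more common there than overall.
expected = k * m / (m + n)

# In R terms: phyper(q - 1, m, n, k, lower.tail = FALSE) gives P(X >= q),
# while phyper(q, m, n, k, lower.tail = FALSE) gives P(X > q).
p_ge = hypergeom_upper_tail(q, m, n, k, inclusive=True)
p_gt = hypergeom_upper_tail(q, m, n, k, inclusive=False)

print(f"expected protective count under the null: {expected:.1f}")
print(f"P(X >= {q}) = {p_ge:.4g}")
print(f"P(X >  {q}) = {p_gt:.4g}")
```

The two tail conventions differ only by the probability of observing exactly q protective associations, so which one is used matters mainly when the result is close to a significance threshold.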

Suggested improvements: experiments, data for possible revision
Because smoking is the leading risk factor of lung cancer, the genetic variants associated with smoking commonly contribute to the risk of developing lung cancer (e.g., 15q25.1). Are rare CHRNB2 variants associated with lung cancer risk in UKB data?
'Lung cancer' and 'family history of lung cancer' are among the many secondary phenotypes that we studied for associations with our main hits, and we have reported the results in Supplementary table 5. We did observe protective effect sizes for CHRNB2 pLOF-plus-missense burden; however, they did not reach statistical significance (Lung cancer: OR=0.85; CI=0.48-1.5; P=0.57; family history of lung cancer: OR=0.83; CI=0.69-1.0; P=0.06).
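As a rough reading aid for the odds ratios and confidence intervals reported above, the sketch below back-calculates an approximate log odds ratio, standard error and two-sided P value from a rounded OR and its 95% CI. It assumes a symmetric Wald-type interval on the log-odds scale and a normal approximation; the response does not state how the intervals and P values were computed, so the output is only expected to agree with the reported values up to rounding and method differences. The function name is hypothetical.

```python
from math import erfc, log, sqrt

def wald_from_or_ci(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Approximate beta, SE, z and two-sided P from an OR and its 95% CI.

    Assumes the CI is exp(beta +/- z_crit * SE), i.e. a Wald interval.
    """
    beta = log(odds_ratio)
    se = (log(ci_high) - log(ci_low)) / (2 * z_crit)
    z = beta / se
    p_two_sided = erfc(abs(z) / sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return beta, se, z, p_two_sided

# Rounded values as reported in the response for the two secondary phenotypes.
reported = {
    "lung cancer": (0.85, 0.48, 1.5),
    "family history of lung cancer": (0.83, 0.69, 1.0),
}

for phenotype, (odds_ratio, lo, hi) in reported.items():
    beta, se, z, p = wald_from_or_ci(odds_ratio, lo, hi)
    print(f"{phenotype}: beta={beta:.3f}, SE={se:.3f}, z={z:.2f}, P~{p:.2g}")
```

Because the inputs are rounded to two significant figures, small discrepancies from the reported P values are expected, particularly when an interval bound sits at or near 1.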
In previous GWAS, more than 400 loci were reported to be associated with smoking. What about the associations for the rare variants in these loci?
We did investigate the rare variant associations at the GWAS loci mapped by Liu et al. and found no significant associations beyond CHRNB2, but we did not report this in our earlier draft. Motivated by the reviewer's comment, we have now repeated this analysis using the latest GWAS results by Saunders et al. (ref. 1). We studied the rare variant burden associations of two sets of genes: genes mapped to all the smoking-related GWAS loci and 'high-priority genes', i.e., genes mapped to GWAS loci with less than 5 fine-mapped variants as reported by Saunders et al. Unfortunately, as in the previous analysis, this new analysis did not find any significant associations (FDR 1%) beyond CHRNB2. We now report this in the manuscript. We have added a paragraph under the section 'Association of rare variants at known GWAS loci' and included a supplementary figure (shown below) visualizing the results.
Heavy-smokers were defined as those who smoked 10 or more cigarettes per day. Additional sensitivity analysis should be performed to test the robustness of the definition, as this definition was different from other studies, which used pack-years of 20/30 to define heavy smokers.
We agree with the reviewer that the definition of heavy smoking varies from study to study. The reason why we chose the cut-off of ≥ 10 cigarettes per day is to align with the heavy-smoker phenotype in the MCPS which was defined as cig-per-day ≥ 10. At the time of the analysis, we did not have information about cigarettes per day and so couldn't redefine the heavy-smoker phenotype in the MCPS cohort. As per the reviewer's request, we have now performed a sensitivity analysis testing the consistency of the protective association of CHRNB2 pLOF-plus-missense burden across different definitions of heavy-smoker. We defined heavy-smoker in the UKB using four definitions: cig-per-day ≥ 10 (the definition used in the paper), cig-per-day ≥ 20, pack-years ≥ 20 and pack-years ≥ 30 and performed rare variant burden analysis for each of the definitions. As shown in the forest plot below, we observe significant protective associations with CHRNB2 pLOF-plus-missense burden across all definitions.
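The four heavy-smoker definitions compared in this sensitivity analysis can be written out explicitly. The sketch below is illustrative only: it assumes the conventional pack-years formula (cigarettes per day divided by 20, multiplied by years smoked), which the response does not spell out, and the example participant is hypothetical. It shows that one person can be a heavy smoker under some definitions and not others, which is why the burden test was repeated for each.

```python
def pack_years(cigs_per_day, years_smoked):
    """Conventional pack-years: packs per day (20 cigarettes each) x years smoked.
    Assumed here; the source does not state how pack-years were derived."""
    return cigs_per_day / 20.0 * years_smoked

# The four thresholds named in the response, keyed by a descriptive label.
HEAVY_SMOKER_DEFINITIONS = {
    "cig-per-day >= 10": lambda cpd, py: cpd >= 10,   # definition used in the paper
    "cig-per-day >= 20": lambda cpd, py: cpd >= 20,
    "pack-years >= 20": lambda cpd, py: py >= 20,
    "pack-years >= 30": lambda cpd, py: py >= 30,
}

def classify_heavy_smoker(cigs_per_day, years_smoked):
    """Return a dict mapping each definition to True/False for one participant."""
    py = pack_years(cigs_per_day, years_smoked)
    return {
        name: rule(cigs_per_day, py)
        for name, rule in HEAVY_SMOKER_DEFINITIONS.items()
    }

# Hypothetical participant: 15 cigarettes per day for 30 years (22.5 pack-years).
example = classify_heavy_smoker(cigs_per_day=15, years_smoked=30)
for name, is_heavy in example.items():
    print(f"{name}: {is_heavy}")
```

In an actual sensitivity analysis, each of these four binary phenotypes would be tested separately against the rare variant burden with the same covariates.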
We have included this plot as supplementary figure 4 in the manuscript and added a line in the main results section as follows.
"The protective association of CHRNB2 pLOF-plus-missense burden with heavy smoking was observed irrespective of how we define heavy smoking (Supplementary Fig. 4)."

References: appropriate credit to previous work?
13. As whole-genome sequencing is gradually implemented in the population study (PMID: 32499645, 32581362, 33589841, 36113475), the authors may also discuss the limitation of the population study based on whole-exome sequencing.
Our major goal in this project is to identify potential drug targets by using a well-proven successful formula: WES at scale. We are still skeptical about the value of WGS in drug target discovery over WES. However, we now mention that using WES to study drug targets may miss noncoding variation as a limitation in the discussion.
"Finally, we have focused only on the coding regions of the genome captured via whole exome sequencing (WES) and so, we may have missed rare variants with large effects on smoking behavior residing in noncoding regulatory regions. With the recent increase in large-scale whole genome sequencing (WGS) efforts, rare large-effect regulatory variants influencing human diseases and traits are being discovered and such discoveries may have the potential to lead to drug targets. However, the question of whether WGS is a more cost-effective investment than WES for drug target discovery is yet to be answered."

Clarity and context: lucidity of abstract/summary, appropriateness of abstract, introduction and conclusions
Abstract. "α4β2 is the predominant nAChR in human brain and is one of the targets of varenicline, a partial nAChR agonist/antagonist used to aid smoking cessation." This is not the result of this association study. The description here is misleading.
We have now removed these lines from the abstract.
Reviewer #3: The manuscript by Rajagopal and colleagues describes a comprehensive ExWAS of smoking for rare variants in up to 749,459 individuals across multiple ancestries. The study leveraged several well-known cohorts including UK Biobank, GHS, MCPS, and SINAI. They used 6 primary smoking phenotypes for their analysis. The main finding is that rare coding variants in CHRNB2 are protective for smoking. The authors also identified two other genes, ASXL1 and DNMT3A, that reached ExWAS significance and determined that these genetic signals were mainly due to the CHIP variants. The relationships between common variants, as well as PGS, and rare variants of CHRNB2 with smoking were also explored.
There is a prior publication from the TOPMED study that described the contribution of rare variants to smoking based on WGS (Jang et al, PMID: 35927319). As the authors stated, that study was not able to pinpoint any specific gene. This study is novel and adds to our current knowledge of genetics associated with smoking behavior by analyzing rare variants. Their findings are also supported by animal-based studies which showed protective effects of CHRNB2 in nicotine dependence. Validation of a known 3'UTR SNP in CHRNB2 with this study also supports the validity of CHRNB2 as a protective gene. This study supports CHRNB2 as a potential therapeutic target for smoking cessation and addiction and the availability of known drugs targeting this important class of proteins.
We thank the reviewer for summarizing and highlighting the major findings of our study.
Although the manuscript is well written, revision of the text to be more precise will improve the clarity and help the readers:
It is probably better to move the supplemental fig 1 to the main text. The total N of subjects from each study included in this study should be indicated.
As per the reviewer's request, we have now moved the study design figure to main figure 1. Regarding sample size, we have provided cohort-specific and ancestry-specific sample sizes for the six smoking phenotypes in Supplementary Table 1.
Add total N to the "Case counts" and "Control counts" for Fig 1b for each gene.
In the forest plot, the Case counts and Control counts provide sample size broken down to wild type, heterozygotes and homozygotes for the minor allele. The total sum of these numbers will correspond to the total number of cases and controls which will be the same for all the genes and can be found in Supplementary Table 1.
The section "Associations of rare and common variants in CHRNB2" in Results should be in two sections describing rare and common variants separately.
As per the reviewer's request, we have now split the rare and common variant associations in the results into two sections.
The long paragraph in the discussion may be better as two paragraphs.
We have now split the second long paragraph of discussion into two.

Decision Letter, first revision:
31st Mar 2023
Dear Dr. Coppola,
Thank you for submitting your revised manuscript "Rare coding variants in CHRNB2 reduce the likelihood of smoking" (NG-A61332R). It has now been seen by the original referees and their comments are below. The reviewers find that the paper has improved in revision, and therefore we'll be happy in principle to publish it in Nature Genetics, pending minor revisions to comply with our editorial and formatting guidelines.
We are now performing detailed checks on your paper and will send you a checklist detailing our editorial and formatting requirements soon. Please do not upload the final materials and make any revisions until you receive this additional information from us.
Thank you again for your interest in Nature Genetics. Please do not hesitate to contact me if you have any questions.

Your manuscript "Rare coding variants in CHRNB2 reduce the likelihood of smoking" has been accepted for publication in an upcoming issue of Nature Genetics.
Over the next few weeks, your paper will be copyedited to ensure that it conforms to Nature Genetics style. Once your paper is typeset, you will receive an email with a link to choose the appropriate publishing options for your paper and our Author Services team will be in touch regarding any additional information that may be required.
After the grant of rights is completed, you will receive a link to your electronic proof via email with a request to make any corrections within 48 hours. If, when you receive your proof, you cannot meet this deadline, please inform us at rjsproduction@springernature.com immediately.
You will not receive your proofs until the publishing agreement has been received through our system.
Due to the importance of these deadlines, we ask that you please let us know now whether you will be difficult to contact over the next month. If this is the case, we ask you provide us with the contact information (email, phone and fax) of someone who will be able to check the proofs on your behalf, and who will be available to address any last-minute problems.
Your paper will be published online after we receive your corrections and will appear in print in the next available issue. You can find out your date of online publication by contacting the Nature Press Office (press@nature.com) after sending your e-proof corrections. Now is the time to inform your Public Relations or Press Office about your paper, as they might be interested in promoting its publication. This will allow them time to prepare an accurate and satisfactory press release. Include your manuscript tracking number (NG-A61332R1) and the name of the journal, which they will need when they contact our Press Office.
Before your paper is published online, we shall be distributing a press release to news organizations worldwide, which may very well include details of your work. We are happy for your institution or funding agency to prepare its own press release, but it must mention the embargo date and Nature Genetics. Our Press Office may contact you closer to the time of publication, but if you or your Press Office have any enquiries in the meantime, please contact press@nature.com.
Acceptance is conditional on the data in the manuscript not being published elsewhere, or announced in the print or electronic media, until the embargo/publication date. These restrictions are not intended to deter you from presenting your data at academic meetings and conferences, but any enquiries from the media about papers not yet scheduled for publication should be referred to us.
Please note that Nature Genetics is a Transformative Journal (TJ). Authors may publish their research with us through the traditional subscription access route or make their paper immediately open access through payment of an article-processing charge (APC). Authors will not be required to make a final decision about access to their article until it has been accepted. Find out more about Transformative Journals: https://www.springernature.com/gp/open-research/transformative-journals
Authors may need to take specific actions to achieve compliance (https://www.springernature.com/gp/open-research/funding/policy-compliance-faqs) with funder and institutional open access mandates. If your research is supported by a funder that requires immediate open access (e.g. according to Plan S principles, https://www.springernature.com/gp/open-research/plan-s-compliance) then you should select the gold OA route, and we will direct you to the compliant route where possible. For authors selecting the subscription publication route, the journal's standard licensing terms will need to be accepted, including those at https://www.nature.com/nature-portfolio/editorialpolicies/self-archiving-and-license-to-publish. Those licensing terms will supersede any other terms that the author or any third party may assert apply to any version of the manuscript.
Please note that Nature Portfolio offers an immediate open access option only for papers that were first submitted after 1 January, 2021.
If you have any questions about our publishing options, costs, Open Access requirements, or our legal forms, please contact ASJournals@springernature.com.
If you have posted a preprint on any preprint server, please ensure that the preprint details are updated with a publication reference, including the DOI and a URL to the published version of the article on the journal website.
To assist our authors in disseminating their research to the broader community, our SharedIt initiative provides you with a unique shareable link that will allow anyone (with or without a subscription) to read the published article. Recipients of the link with a subscription will also be able to download and print the PDF.
As soon as your article is published, you will receive an automated email with your shareable link.
You can now use a single sign-on for all your accounts, view the status of all your manuscript submissions and reviews, access usage statistics for your published articles and download a record of your refereeing activity for the Nature journals.
An online order form for reprints of your paper is available at https://www.nature.com/reprints/author-reprints.html. Please let your coauthors and your institutions' public affairs office know that they are also welcome to order reprints by this method.
If you have not already done so, we invite you to upload the step-by-step protocols used in this manuscript to the Protocols Exchange, part of our on-line web resource, natureprotocols.com. If you complete the upload by the time you receive your manuscript proofs, we can insert links in your article that lead directly to the protocol details. Your protocol will be made freely available upon publication of your paper. By participating in natureprotocols.com, you are enabling researchers to more readily reproduce or adapt the methodology you use. Natureprotocols.com is fully searchable, providing your protocols and paper with increased utility and visibility. Please submit your protocol to https://protocolexchange.researchsquare.com/. After entering your nature.com username and password you will need to enter your manuscript number (NG-A61332R1). Further information can be found at https://www.nature.com/nature-portfolio/editorial-policies/reporting-standards#protocols

Sincerely, Wei
Wei Li, PhD Senior Editor Nature Genetics New York, NY 10004, USA www.nature.com/ng